Improving out-of-vocabulary name resolution

Authors

  • David D. Palmer
  • Mari Ostendorf
Abstract

This paper presents algorithms for generating targeted name lists for candidate out-of-vocabulary (OOV) words for applications in language processing, particularly speech recognition. Focusing on names, which are shown to be the dominant class of OOVs in news broadcasts, the approach involves offline generation of a large name list and online pruning based on a phonetic distance. The resulting list can be used in a rescoring pass in automatic speech recognition. We also show that a simple variation of the approach can be used to generate alternate name spellings which may be useful for query expansion in information retrieval. By using a wide variety of sources, including automatic name phrase tagging of temporally relevant news text, OOV coverage can be improved by nearly a factor of two with only a 10% increase in the word list size. For one source, coverage increased from 13% to 94%. Phonetic pruning can be used to reduce the list size by an order of magnitude with only a small loss in coverage.
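The online pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the phonetic distance used, so this example substitutes a stand-in: a Soundex-style consonant-class key computed from spelling, compared by Levenshtein distance. The names `phonetic_key`, `edit_distance`, `prune_names`, and the `max_distance` threshold are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, List

# Assumption: a coarse Soundex-style grouping of consonants into sound classes.
# This is a stand-in for whatever phonetic representation the paper uses.
_SOUND_CLASSES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def phonetic_key(name: str) -> str:
    """Map a spelling to a string of consonant classes.

    Vowels and unlisted characters are dropped; directly adjacent letters
    that share a class are collapsed into one code.
    """
    key = []
    prev = None
    for ch in name.lower():
        code = _SOUND_CLASSES.get(ch)
        if code and code != prev:
            key.append(code)
        prev = code
    return "".join(key)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def prune_names(hypothesis: str, name_list: Iterable[str],
                max_distance: int = 1) -> List[str]:
    """Keep only names whose phonetic key is close to the hypothesis key."""
    target = phonetic_key(hypothesis)
    return [n for n in name_list
            if edit_distance(target, phonetic_key(n)) <= max_distance]


# Hypothetical usage: a large offline name list is pruned against one
# recognizer hypothesis for a suspected OOV region.
candidates = ["Milosevic", "Milosevich", "Ostendorf", "Palmer"]
shortlist = prune_names("milosevitch", candidates)
```

In this sketch the shortlist keeps the two spelling variants of the same name and drops the unrelated ones, which mirrors the idea of shrinking the list before a rescoring pass. A real system would more likely compare phone sequences from the recognizer than spelling-derived keys.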

Similar resources

About improving recognition of spontaneously uttered French city-names

This paper deals with the recognition of French city-names over the telephone. This recognition task, critical in many applications, involves a 40,000 city-name vocabulary, ranging from short monosyllabic words to long official compound names. Data collected from a field experiment are analyzed, and several ways of improving speech recognition performance are investigated. This includes a carefu...

Arabic-to-English Translation for IWSLT 2006

We present techniques for improving domain-specific translation quality with a relatively high OOV ratio on test data sets. The key idea is to maximize the vocabulary coverage without degrading the translation quality. We maximize vocabulary coverage by segmenting a word into a sequence of morphemes, prefix*-stem-suffix*, and by adding a large amount of out-of-domain training corpora. To preserve...

Placename Ambiguity Resolution

It is common for placenames to reference other named entities (e.g. names of people, names of organizations, etc.) and to be used as vocabulary words (e.g. city of Split). Apart from reference ambiguity, placenames are faced with the problem of referent ambiguity (i.e., a placename referring to multiple places). Many places are also referred to by multiple names (e.g. Netherlands vs. Holland). ...

Geographical Scope Resolution

It is common for placenames to reference other named entities (e.g., names of people, names of organizations, etc.) and to be used as vocabulary words (e.g., city of Split). Apart from reference ambiguity, placenames are faced with the problem of referent ambiguity (i.e., a placename referring to multiple places). Many places are also referred to by multiple names (e.g., Netherlands vs. Holland...

The Relationship between Iranian Upper-Intermediate EFL Learners’ Contrastive Lexical Competence and Their Use of Vocabulary Learning Strategies

Regarding the vital role of lexical competence as an important requisite for the attainment of full mastery of the four language skills, this study tried to investigate the relationship between Iranian EFL learners’ contrastive lexical competence and their use of vocabulary learning strategies. To fulfil this objective, 60 Iranian upper-intermediate male and female language learners were select...

Journal:
  • Computer Speech & Language

Volume 19

Publication date: 2005